Matrix Multiplication Specialization in STAPL
نویسندگان
چکیده
The Standard Template Adaptive Parallel Library (STAPL) is a superset of C++’s Standard Template Library (STL) which allows highproductivity parallel programming in both distributed and shared memory environments. This framework provides parallel equivalents of STL containers and algorithms enabling ease of development for parallel systems. In this paper, we will discuss our methodology for implementing a fast and efficient matrix multiplication algorithm in STAPL. Our implementation employs external linear algebra libraries, specifically the Basic Linear Algebra Subprograms (BLAS) library which includes highly optimized sequential matrix operations. The paper will describe the benefits of creating a parallel matrix multiplication algorithm whose library calls are specialized based on both the matrix storage and traversal. This specialization technique ensures that the most appropriate implementation in terms of data access and structure will be used, resulting in increased efficiency compared to a non-specialized approach.
منابع مشابه
Optimization of Sparse Matrix-Vector Multiplication by Specialization
Program specialization is the process of generating optimized programs based on available inputs. It is particularly applicable when some input data are used repeatedly while other input data vary. Specialization can be employed at compile-time as well as at run-time, depending on when the inputs become available. In this paper we explore the potential for obtaining speed-ups for sparse matrix-...
متن کاملOptimization by Run-time Specialization for Sparse Matrix-Vector Multiplication (Submitted for publication)
Run-time specialization is the process of generating programs based on information available only at run time. This technique has the potential of generating highly efficient codes, at the expense of the overheads of the run-time code generation. It is applicable when some input data is used repeatedly while other input data varies. In this paper we explore the potential for obtaining speedups ...
متن کاملA New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication can not run on Fibonacci Hypercube structure, therefore giving a method that can be run on all structures especially Fibonacci Hypercube structure is necessary for parallel matr...
متن کاملBacon: A GPU Programming Language With Just in Time Specialization (Draft)
This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly more convenien...
متن کاملBacon: A GPU Programming System With Just in Time Specialization
This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly more convenien...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008